Are the 1000 genomes variants in dbSNP?

Answer:

All SNPs and short indels from the 1000 Genomes Project were submitted to dbSNP, and longer structural variants were submitted to the DGVa.

Where possible, release VCF files contain the appropriate IDs in the ID column, such as dbSNP rs IDs.
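
As a minimal sketch of reading those IDs, the Python below pulls rs IDs from the third (ID) column of VCF data lines. The function name `rs_ids_from_vcf` is made up for illustration and is not part of any project tooling. It assumes the usual VCF conventions: tab-separated columns, `.` for a missing ID, and semicolons between multiple IDs.

```python
from typing import Iterable, Iterator, Tuple


def rs_ids_from_vcf(lines: Iterable[str]) -> Iterator[Tuple[str, int, str]]:
    """Yield (chrom, pos, rs_id) for each dbSNP rs ID found in VCF data lines."""
    for line in lines:
        # Skip blank lines, meta-information (##...) and the column header (#CHROM...).
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # not a complete VCF record
        chrom, pos, id_field = fields[0], int(fields[1]), fields[2]
        # "." means no ID was assigned; several IDs may be joined with ";".
        for variant_id in id_field.split(";"):
            if variant_id.startswith("rs"):
                yield chrom, pos, variant_id
```

For a compressed release file, the function can be fed an open text handle, for example `gzip.open(path, "rt")`, where `path` is whichever VCF you have downloaded.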

The archives contain variants discovered by the final phase of the 1000 Genomes Project (phase 3) and also by the preliminary pilot and phase 1 stages of the project. Because methods were refined over the course of the project, phase 3 represents the final data set.

Are all the variants displayed on the 1000 Genomes Project Browser discovered by the project?

Answer:

No, not all the variants in the browsers produced by the 1000 Genomes Project were discovered by the 1000 Genomes Project.

The data from the 1000 Genomes Project is available in a number of browsers, including browsers produced by the 1000 Genomes Project, which reflect the major data releases associated with the pilot, phase 1 and phase 3 publications from the 1000 Genomes Project. More information on this is available on the browsers page.

The 1000 Genomes Project Browsers, maintained during the 1000 Genomes Project, are based on custom versions of the Ensembl browser. Their databases contain the Ensembl core features (genes and transcripts), regulatory elements from the Ensembl Regulatory Build and variation data from the Ensembl Variation database.

As well as 1000 Genomes Project variation data, Ensembl Variation contains data from dbSNP, ClinVar, COSMIC, dbGaP, dbVar, EGA and many other sources.

Can I find the genomic position for a list of dbSNP rs numbers?

Answer:

This can be done using Ensembl’s BioMart.

This YouTube video gives a tutorial on how to do it.

The basic steps are:

  1. Select the Ensembl Variation Database
  2. Select the Homo sapiens Short Variants (SNPs and indels excluding flagged variants) dataset
  3. Select the Filters menu from the left-hand side
  4. Expand the General Variant Filters section
  5. Check the Filter by Variant Name (e.g. rs123, CM000001) [Max 500 advised] box
  6. Add your list of rs numbers to the box or browse for a file which contains this list
  7. Click on the Results button in the headline section
  8. This should give you a table of results, which you can also download in Excel or CSV format

If you would like the coordinates on GRCh38, use the main Ensembl site; if you would like the coordinates on GRCh37, use the dedicated GRCh37 site.
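
The same lookup can also be scripted. The sketch below builds a BioMart XML query for a list of rs numbers and, optionally, posts it to the BioMart web service. The dataset name `hsapiens_snp`, the filter name `snp_filter`, the attribute names and the `martservice` URLs are assumptions based on BioMart's usual internal names and are not given in the steps above. Check them against the XML that the BioMart interface itself generates before relying on them.

```python
import urllib.parse
import urllib.request
from xml.sax.saxutils import quoteattr

# Assumed service endpoints; the GRCh37 site is served from its own host.
MART_URLS = {
    "GRCh38": "https://www.ensembl.org/biomart/martservice",
    "GRCh37": "https://grch37.ensembl.org/biomart/martservice",
}

# Assumed BioMart attribute names for variant ID, chromosome, start and end.
ATTRIBUTES = ("refsnp_id", "chr_name", "chrom_start", "chrom_end")


def build_query(rs_ids, max_ids=500):
    """Return BioMart query XML asking for the positions of the given rs IDs."""
    ids = [i.strip() for i in rs_ids if i.strip()]
    if not ids:
        raise ValueError("no rs IDs given")
    if len(ids) > max_ids:
        raise ValueError(f"{len(ids)} IDs given; at most {max_ids} advised per query")
    attrs = "".join(f"<Attribute name={quoteattr(a)}/>" for a in ATTRIBUTES)
    return (
        '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'
        '<Query virtualSchemaName="default" formatter="TSV" header="1" '
        'uniqueRows="1" datasetConfigVersion="0.6">'
        '<Dataset name="hsapiens_snp" interface="default">'
        f'<Filter name="snp_filter" value={quoteattr(",".join(ids))}/>'
        f"{attrs}</Dataset></Query>"
    )


def fetch_positions(rs_ids, assembly="GRCh38", timeout=60):
    """Submit the query and return the tab-separated result as text."""
    data = urllib.parse.urlencode({"query": build_query(rs_ids)}).encode()
    with urllib.request.urlopen(MART_URLS[assembly], data=data, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Calling `fetch_positions(["rs123"], assembly="GRCh37")` would request GRCh37 coordinates from the dedicated GRCh37 site. Longer lists are best split into batches of at most 500, in line with the limit advised in step 5.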

Why isn't my SNP in browser.1000genomes.org?

Answer:

Ensembl and the UCSC Genome Browser both import their variant data from dbSNP. When new 1000 Genomes variants are released, it can take some time for them to be accessioned by dbSNP and make their way to the browsers.

When this happens, we try to ensure there is a version of our own browser which displays the data in the meantime. Both Ensembl and UCSC also support attaching VCF files for visualisation.

Why isn't a SNP in dbSNP or HapMap?

Answer:

The 1000 Genomes Project submits all its variants to archives such as dbSNP or the DGVa. If a variant hasn’t yet made it into dbSNP, it is likely to be a new site which we haven’t yet submitted. There may also be some older sites which we subsequently found to be false discoveries and then suppressed.

As for our overlap with the HapMap site list, the majority of HapMap SNPs are found in the 1000 Genomes Project. There will be a small number of sites we fail to find using next-generation sequencing, but most HapMap sites which aren’t found by the 1000 Genomes Project will be false discoveries by HapMap. Many SNPs from the 1000 Genomes Project and other next-generation sequencing projects won’t be part of HapMap, as HapMap is based on an older genotyping technology from a time when such rapid variant discovery using sequencing was not possible.
